Neural Radiance Fields (NeRF) achieve photo-realistic view synthesis given densely captured input images. With sparse views, however, the geometry of NeRF is severely under-constrained, leading to significant degradation of novel view synthesis quality. Inspired by self-supervised depth estimation methods, we propose StructNeRF, a solution to novel view synthesis for indoor scenes with sparse inputs. StructNeRF leverages the structural hints naturally embedded in multi-view inputs to handle the unconstrained geometry problem in NeRF. Specifically, it tackles textured and non-textured regions separately: a patch-based multi-view consistent photometric loss is proposed to constrain the geometry of textured regions, while non-textured regions are explicitly restricted to be 3D-consistent planes. Through these dense self-supervised depth constraints, our method improves both the geometry and the view synthesis performance of NeRF without any additional training on external data. Extensive experiments on several real-world datasets demonstrate that StructNeRF surpasses state-of-the-art methods for indoor scenes with sparse inputs, both quantitatively and qualitatively.
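For illustration, here is a minimal PyTorch sketch of a patch-based multi-view photometric check in the spirit described above: patch pixels are back-projected with NeRF-predicted depth, reprojected into a source view, and compared photometrically. All names (`K`, `T_ref2src`, the function itself) are hypothetical stand-ins, not StructNeRF's actual implementation.

```python
import torch
import torch.nn.functional as F

def patch_photometric_loss(ref_patch_rgb, ref_pix, depth, K, T_ref2src, src_img):
    """ref_patch_rgb: (N,3) colors of reference patch pixels
    ref_pix: (N,2) pixel coords (x,y) in the reference view
    depth: (N,) NeRF-predicted depth per pixel
    K: (3,3) intrinsics; T_ref2src: (4,4) relative camera pose
    src_img: (1,3,H,W) source image"""
    N = ref_pix.shape[0]
    ones = torch.ones(N, 1)
    pix_h = torch.cat([ref_pix, ones], dim=1)             # homogeneous pixels (N,3)
    cam = (torch.linalg.inv(K) @ pix_h.T) * depth         # back-project: (3,N)
    cam_h = torch.cat([cam, ones.T], dim=0)               # (4,N)
    src_cam = (T_ref2src @ cam_h)[:3]                     # transform into source frame
    src_pix = K @ src_cam                                 # project to source pixels
    src_pix = src_pix[:2] / src_pix[2:].clamp(min=1e-6)   # perspective divide (2,N)
    H, W = src_img.shape[-2:]
    grid = torch.stack([src_pix[0] / (W - 1) * 2 - 1,     # normalize to [-1,1]
                        src_pix[1] / (H - 1) * 2 - 1], dim=-1)
    warped = F.grid_sample(src_img, grid.view(1, 1, N, 2),
                           align_corners=True).view(3, N).T
    return (warped - ref_patch_rgb).abs().mean()          # L1 photometric error
```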
Point clouds captured by depth sensors are often contaminated by noise, which hinders further analysis and applications. In this paper, we highlight the importance of point distribution uniformity for downstream tasks. We demonstrate that point clouds produced by existing gradient-based denoisers lack uniformity despite their promising quantitative results. To this end, we propose GPCD++, a gradient-based denoiser with an ultra-lightweight network named UniNet that addresses uniformity. Compared with previous state-of-the-art methods, our approach not only produces competitive or even better denoising results but also significantly improves uniformity, which greatly benefits applications such as surface reconstruction.
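A toy sketch of gradient-based point-cloud denoising as described above: points are iteratively displaced along an estimated gradient field pointing toward the underlying surface. The `score_net` here is a hypothetical stand-in; GPCD++ additionally runs its lightweight UniNet to enforce uniformity, which this sketch omits.

```python
import torch

def denoise(points, score_net, steps=30, step_size=0.02):
    """points: (N,3) noisy point cloud; score_net maps (N,3) -> (N,3)
    gradients that point toward the underlying surface."""
    pts = points.clone()
    for _ in range(steps):
        with torch.no_grad():
            grad = score_net(pts)         # estimated gradient of the log-density
        pts = pts + step_size * grad      # gradient step toward the surface
    return pts
```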
Is the center position fully capable of representing a pixel? There is nothing wrong with representing pixels by their centers in a discrete image representation, but in the context of image super-resolution (SR) it is more meaningful to treat each pixel as the aggregation of signals over a local area. Despite the great capability of coordinate-based implicit representations for arbitrary-scale image SR, this area-based nature of pixels is not fully taken into account. To this end, we propose integrated positional encoding (IPE), which extends traditional positional encoding by aggregating frequency information over pixel areas. We apply IPE to a state-of-the-art arbitrary-scale image super-resolution method, the local implicit image function (LIIF), yielding IPE-LIIF. We show the effectiveness of IPE-LIIF through quantitative and qualitative evaluation, and further demonstrate the generalization ability of IPE to larger image scales and to multiple implicit-representation-based methods. Code will be released.
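A minimal sketch of one plausible form of integrated positional encoding: each coordinate is treated as a small region of width `cell` rather than a point, and each sinusoid is attenuated by a Gaussian-style factor reflecting how much it averages out over that region. This is an assumption-laden illustration, not the paper's exact formulation.

```python
import torch

def integrated_pos_enc(coords, cell, num_freqs=10):
    """coords: (..., 2) pixel-center coordinates; cell: (..., 2) pixel extents."""
    freqs = 2.0 ** torch.arange(num_freqs)              # (L,) frequency bands
    x = coords.unsqueeze(-1) * freqs                    # (..., 2, L)
    var = (cell.unsqueeze(-1) * freqs) ** 2 / 12.0      # variance of a uniform region
    damp = torch.exp(-0.5 * var)                        # attenuate high frequencies
    enc = torch.cat([torch.sin(x) * damp, torch.cos(x) * damp], dim=-1)
    return enc.flatten(-2)                              # (..., 2 * 2 * L)
```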
We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis from mostly low-resolution (LR) inputs. Our method is built upon Neural Radiance Fields (NeRF), which predicts per-point density and color with a multi-layer perceptron. While NeRF can produce images at arbitrary scales, it struggles at resolutions beyond those of the observed images. Our key insight is that NeRF has a local prior: predictions made at a 3D point can be propagated to nearby regions and remain accurate. We first exploit this with a supersampling strategy that shoots multiple rays through each image pixel, enforcing multi-view constraints at the sub-pixel level. We then show that NeRF-SR can further boost the performance of supersampling with a refinement network that leverages the estimated depth to hallucinate details from related patches of an HR reference image. Experimental results demonstrate that NeRF-SR generates high-quality HR novel views on both synthetic and real-world datasets.
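The supersampling idea can be sketched as follows: each LR pixel is covered by an s x s grid of sub-pixel rays, and the mean of their rendered colors is what gets supervised by the LR observation. `render_ray` is a hypothetical stand-in for a NeRF rendering call, not NeRF-SR's actual API.

```python
import torch

def supersampled_pixel(render_ray, pixel_xy, s=2):
    """pixel_xy: (2,) LR pixel center; returns the mean color over s*s sub-rays."""
    offs = (torch.arange(s) + 0.5) / s - 0.5             # sub-pixel offsets in [-0.5, 0.5)
    dy, dx = torch.meshgrid(offs, offs, indexing="ij")
    sub = pixel_xy + torch.stack([dx, dy], dim=-1).view(-1, 2)  # (s*s, 2) sub-positions
    colors = torch.stack([render_ray(p) for p in sub])   # render each sub-pixel ray
    return colors.mean(dim=0)                            # average matches the LR pixel
```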
Neural Radiance Fields (NeRF) achieve unprecedented view synthesis quality using coordinate-based neural scene representations. However, NeRF's view dependency can only handle simple reflections such as highlights; it fails on complex reflections such as those from glass and mirrors. In these scenarios, NeRF models the virtual image as real geometry, which leads to inaccurate depth estimation and produces blurry renderings when multi-view consistency is violated, since the reflected objects may only be visible from some viewpoints. To overcome these issues, we introduce NeRFReN, which builds on NeRF to model scenes with reflections. Specifically, we propose to split a scene into transmitted and reflected components and to model the two components with separate neural radiance fields. Since this decomposition is highly under-constrained, we exploit geometric priors and carefully designed training strategies to achieve reasonable decomposition results. Experiments on various self-captured scenes show that our method achieves high-quality novel view synthesis and physically sound depth estimation while enabling scene editing applications. Code and data will be released.
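A minimal sketch of the two-branch composition described above: a transmitted field and a reflected field are rendered separately and blended by a per-ray reflection fraction. Function names and the blending form are illustrative placeholders, not NeRFReN's exact formulation.

```python
def composite(render_transmitted, render_reflected, ray):
    """Compose a pixel from separately rendered transmitted/reflected fields."""
    rgb_t, beta = render_transmitted(ray)   # transmitted color + reflection fraction
    rgb_r, _ = render_reflected(ray)        # color of the virtual (reflected) image
    return rgb_t + beta * rgb_r             # blended pixel color
```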
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate the distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression problem, following the easy-to-hard principle of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets; the results indicate that PMT-IQA outperforms the comparison approaches and that both the MS and PMT modules improve the model's performance.
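A hedged sketch of the progressive multi-task idea as we read it: an easy auxiliary task (coarse quality classification) and the hard regression task share features, with the loss weight shifting from easy to hard over training. The module and loss choices here are our assumptions, not the paper's exact design.

```python
import torch.nn as nn
import torch.nn.functional as F

class PMTHead(nn.Module):
    def __init__(self, feat_dim=256, num_levels=5):
        super().__init__()
        self.cls = nn.Linear(feat_dim, num_levels)  # easy task: coarse quality level
        self.reg = nn.Linear(feat_dim, 1)           # hard task: continuous score

    def loss(self, feat, level, score, progress):
        """progress in [0,1] shifts emphasis from classification to regression."""
        l_cls = F.cross_entropy(self.cls(feat), level)
        l_reg = F.mse_loss(self.reg(feat).squeeze(-1), score)
        return (1 - progress) * l_cls + progress * l_reg
```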
It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the test performance of the original dense models, but sometimes even slightly boost generalization. A theoretical understanding of such experimental observations has yet to be developed. This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. Specifically, it considers a classification task for overparameterized two-layer neural networks, where the network is randomly pruned at a given rate at initialization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance; more surprisingly, the generalization bound improves as the pruning fraction grows. To complement this positive result, the work also shows a negative one: there exists a large pruning fraction such that while gradient descent can still drive the training loss toward zero (by memorizing noise), the generalization performance is no better than random guessing. This further suggests that pruning can change the feature learning process, leading to the performance drop of the pruned network. To the best of our knowledge, this is the first generalization result for pruned neural networks, suggesting that pruning can improve a neural network's generalization.
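The studied setting can be sketched as follows: a two-layer network whose first-layer weights are randomly pruned at initialization with fraction p, after which the masked network is trained as usual. This is purely illustrative of the setup, not the paper's analysis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrunedTwoLayer(nn.Module):
    def __init__(self, d_in, width, p=0.5):
        super().__init__()
        self.w1 = nn.Linear(d_in, width)
        self.w2 = nn.Linear(width, 1)
        # Bernoulli mask: each first-layer weight is kept with probability 1 - p
        # and the mask stays fixed throughout training.
        self.register_buffer("mask", (torch.rand_like(self.w1.weight) >= p).float())

    def forward(self, x):
        h = torch.relu(F.linear(x, self.w1.weight * self.mask, self.w1.bias))
        return self.w2(h)
```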
Time-series anomaly detection is an important task that has been widely applied in industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining a considerable number of labels at low cost: it enables customers to label data by writing heuristic rules rather than annotating each instance individually. In the time-series domain, however, it is hard for people to write reasonable labeling functions, as time-series data are numerically continuous and difficult to interpret. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection with only a small number of interactions with the system. To achieve this, the system integrates weak supervision and active learning collaboratively, generating labeling functions automatically from only a few labeled examples. These techniques are complementary and reinforce each other. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario to show its practicality.
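For intuition, here is a toy example of the kind of heuristic labeling function such a system builds on: instead of annotating points one by one, a rule votes anomaly, normal, or abstains. The thresholds are arbitrary illustrations, not LEIAD's actual rules.

```python
import numpy as np

ANOMALY, NORMAL, ABSTAIN = 1, 0, -1

def lf_spike(window: np.ndarray) -> int:
    """Flag the last point of a window if it deviates > 4 sigma from the rest."""
    mu, sigma = window[:-1].mean(), window[:-1].std() + 1e-8
    z = abs(window[-1] - mu) / sigma
    if z > 4.0:
        return ANOMALY
    if z < 1.0:
        return NORMAL
    return ABSTAIN              # rule is unsure; let other rules or the user vote
```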
As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) that carry multiple modalities, such as images. However, current MMEA algorithms all adopt KG-level modality fusion strategies and ignore modality differences among individual entities, which hurts robustness to the noise potentially involved in modalities (e.g., unidentifiable images and relations). In this paper, we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, which dynamically predicts mutual correlation coefficients among modalities for instance-level feature fusion. A modal-aware hard entity replay strategy is also proposed to address vague entity details. Extensive experimental results show that our model not only achieves SOTA performance in multiple training scenarios, including supervised, unsupervised, iterative, and low-resource settings, but also has few parameters, promising speed, and good interpretability. Our code will be available soon.
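A hedged sketch of instance-level modality fusion in this spirit: for each entity, a tiny scoring layer weighs its modality embeddings (e.g., graph / relation / image) and fuses them with entity-specific weights rather than one global weight per modality. The names and architecture are our illustration, not MEAformer's actual design.

```python
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # scores each modality embedding

    def forward(self, mods):
        """mods: (batch, num_modalities, dim) per-entity modality embeddings."""
        w = torch.softmax(self.score(mods).squeeze(-1), dim=-1)  # (batch, M) weights
        return (w.unsqueeze(-1) * mods).sum(dim=1)               # fused embedding
```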
The task of video prediction and generation is notoriously difficult, with research in this area largely limited to short-term predictions. Though plagued by noise and stochasticity, videos consist of features organised in a spatiotemporal hierarchy, with different features possessing different temporal dynamics. In this paper, we introduce the Dynamic Latent Hierarchy (DLH), a deep hierarchical latent model that represents a video as a hierarchy of latent states evolving over separate, fluid timescales. Each latent state is a mixture distribution with two components, representing the immediate past and the predicted future, causing the model to learn transitions only between sufficiently dissimilar states while clustering temporally persistent states closer together. Using this property, DLH naturally discovers the spatiotemporal structure of a dataset and learns disentangled representations across its hierarchy. We hypothesise that this simplifies the task of modelling a video's temporal dynamics, improves the learning of long-term dependencies, and reduces error accumulation. As evidence, we demonstrate that DLH outperforms state-of-the-art benchmarks in video prediction, better represents stochasticity, and can dynamically adjust its hierarchical and temporal structure. Among other things, our paper shows how progress in representation learning can translate into progress in prediction tasks.
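A minimal sketch of the two-component latent state described above: each state mixes a "past" component and a "predicted future" component, so a sample either persists in the current state or transitions. The Gaussian parameterization and names are assumptions made for illustration.

```python
import torch
import torch.distributions as D

def latent_mixture(mu_past, sigma_past, mu_pred, sigma_pred, w_pred):
    """Two-component Gaussian mixture over a latent vector.
    mu_*, sigma_*: (batch, dim); w_pred: (batch,) weight of the 'future' component."""
    mix = D.Categorical(torch.stack([1 - w_pred, w_pred], dim=-1))
    comp = D.Independent(
        D.Normal(torch.stack([mu_past, mu_pred], dim=-2),       # (batch, 2, dim)
                 torch.stack([sigma_past, sigma_pred], dim=-2)),
        1)                                                       # dim is the event dim
    return D.MixtureSameFamily(mix, comp)
```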